notes - probability & types

table of contents

  1. event spaces
  2. outer measure
  3. measurable spaces
  4. defining a measure
  5. measure transformations
  6. probability theory

intro

What follows — as with many of my posts — is my explanation of work from someone who is more erudite than me.

This time it is "Formalizing the Beginnings of Bayesian Probability Theory in the Lean Theorem Prover" a Master's thesis from the lovely Rishikesh Vaishnav. You might find it here, ceteris paribus.

I skipped his intro on types (the foundation of Lean's formal proofs), so read that if you like. As you may know, I enjoy Dependent Types at Work by Ana Bove and Peter Dybjer (but that uses a different language, Agda). That being said, I will also exclude all of Vaishnav's code. So, this will read more like HoTT in the line of Martin-Löf's or Hofmann's or Voevodsky's or Awodey's publications.

Vaishnav's organization follows:

  • Measure-Theoretic Foundations
  • The Event Space
  • The Outer Measure
    • Outer Measure Transformations
  • Measurable Spaces
    • Transforming Measurable Spaces
    • A Galois Connection
    • The Trimmed Outer Measure
    • Measurable Spaces on Pi Types
  • Defining a Measure
  • Measure Transformations
    • From a Partial Function
    • The Carathéodory Criterion
    • Lifting an Outer Measure Transformation
    • Restricting a Measure
    • Mapping a Measure
  • Probability Theory
    • Independence
    • Conditional Probability
    • Random Variables

Now, Vaishnav's details mostly fall under "Probability Theory". He also numbered everything, but that's a bit much for me (I use my own numbering here). Right now, I am taking a probability class, not a measure theory class, so I should probably focus on the "Probability Theory" section myself.

The list above is somewhat deceiving, as Vaishnav included sub-sub-headings for the "Probability" section splitting "Independence" and "Conditional" into subsections for definitions and properties. But more importantly, "Random Variables" includes:

  • Random Variables
    • The Joint Distribution
    • Restricting a Pi
    • The Marginal Distribution
    • Marginalization
    • Independence on Random Variables
    • Conditional Independence on Random Variables

That brings me to the fact of the matter. I took an "Engineering Stats" class and forgot everything. I still do not know what any of this means:

Random variable: A variable that takes on different values based on the outcome of a random event. \(X\)

Distribution: The function that describes the probabilities of different outcomes or values of a random variable. \(P(X = x)\)

Mean: The average value of a set of numbers or a random variable. \(\mu\)

Variance: A measure of the spread or dispersion of a set of numbers or a random variable. \(\sigma^2\)

Standard deviation: The square root of the variance, providing a measure of the average distance between each data point and the mean. \(\sigma\)

Normal distribution: A symmetric probability distribution that is commonly used to model real-world phenomena. \(N(\mu, \sigma^2)\)

Hypothesis testing: A statistical method to make inferences or conclusions about a population based on sample data.

Confidence interval: A range of values that is likely to contain the true value of a population parameter with a certain level of confidence. \([a, b]\)

(quoting nobody, I just prefer the blockquote markup to using the pl-5 bootstrap; you can add that to the izak lore if you want)

event spaces

Let a type \( \alpha \) be the "fundamental event space", where each trial yields a value. To assign a measure to sets in \( \alpha \), representing probabilities: use a random variable, denoted \( X: \alpha \rightarrow \beta \), a transformation on the event space.

With a measure function on \( \alpha \), we can define a measure function on \( \beta \) corresponding to \( X \). For any set \( s \) in \( \beta \), the measure of \( s \) is the measure of the preimage of \( s \) under \( X \) (denoted as \( X^{-1}(s) \)). This relationship between \(\alpha\) and \(\beta\) comes to fruition during the "Mapping/Co-Mapping Measurable Spaces" part of Vaishnav's "Outer Measure" section.

Restrictions on a measure function \(m\) come from the "outer measure", which states that, for a function \(m : \text{set } \alpha \rightarrow \mathbb{R}^+_\infty\) to be an outer measure \((\) where \(\mathbb{R}^+_\infty\) represents the non-negative real numbers including infinity\()\), we have the empty set property \((1)\), the monotonicity property \((2)\), and the countable union property \((3)\):

\[ \begin{align*} m(\emptyset) &= 0, \quad & (1) \\ \forall A, B : \text{set } \alpha, A \subseteq B & \Rightarrow m(A) \leq m(B), \quad & (2) \\ \forall f : \mathbb{N} \rightarrow \text{set } \alpha, m\left(\bigcup_{i=0}^\infty f(i)\right) & \leq \sum_{i=0}^\infty m(f(i)), \quad & (3) \end{align*} \]

The empty set property \((1)\) ensures that the empty set has zero mass. The monotonicity property \((2)\) guarantees that a set has at least as much mass as any of its subsets. The countable union property \((3)\) bounds the mass of a set by the total mass of any countable cover of it.

These properties of outer measure allow us to talk about measure using inequalities.
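As a quick sanity check (my own toy example, not from the thesis), the counting measure on a small finite universe satisfies all three properties, with finite unions standing in for the countable case:

```python
from itertools import combinations

# Toy universe and the counting "measure" m(A) = |A| (my own example,
# not from the thesis); on a finite universe, finite unions stand in
# for countable ones.
universe = frozenset({1, 2, 3})

def m(a):
    return len(a)

def powerset(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# (1) empty set property: m(∅) = 0
assert m(frozenset()) == 0

# (2) monotonicity: A ⊆ B ⇒ m(A) ≤ m(B)
for a in powerset(universe):
    for b in powerset(universe):
        if a <= b:
            assert m(a) <= m(b)

# (3) union subadditivity: m(⋃ f(i)) ≤ Σ m(f(i))
cover = [frozenset({1, 2}), frozenset({2, 3})]
union = frozenset().union(*cover)
assert m(union) <= sum(m(c) for c in cover)
print("counting measure satisfies (1)-(3)")
```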

See Vaishnav's paper for info on "why countable".

outer measure

Vaishnav defines:

\[ \text{fromfunc}(m_0)(s) = \inf_{ f : \mathbb{N} \rightarrow \text{set }\alpha} \left\{s \subseteq \bigcup_{i=0}^{\infty} f(i) \Rightarrow \sum_{i=0}^{\infty} m_0(f(i)) \right\} \]

Satisfying: \(\forall s : \text{set } \alpha, \ \text{fromfunc}(m_0)(s) \leq m_0(s).\)

In the type theory style that verges on pseudocode, \(\text{fromfunc}\) takes the base function \(m_0\) and returns an outer measure, which is then applied to \(s\); Vaishnav denotes this \(\text{fromfunc}(m_0)(s)\).

The condition \(s \subseteq \bigcup_{i=0}^{\infty} f(i)\) ensures that the sets \(f(i)\) cover the set \(s\). If this condition is not satisfied for some \(f\), the value is effectively considered \(\infty\).

The function \(\text{fromfunc}(m_0)\) is then defined as a valid outer measure, which satisfies the condition \(m(s) \leq m_0(s)\) for all sets \(s\). It is also proven that \(\text{fromfunc}(m_0)\) is the unique maximum outer measure that satisfies this condition. No other outer measure can exceed it on any set.
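To make the infimum concrete, here is a brute-force sketch of my own (a finite toy, not Vaishnav's Lean code): on a finite universe, the infimum over countable covers reduces to an infimum over finite families of subsets, and the hand-picked values of \(m_0\) below are pure illustration:

```python
from itertools import combinations

universe = frozenset({1, 2})

def powerset(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

subsets = powerset(universe)

# An arbitrary non-negative base function m0 to feed fromfunc
# (values chosen by hand for illustration).
m0 = {frozenset(): 0.0, frozenset({1}): 1.0,
      frozenset({2}): 1.0, frozenset({1, 2}): 5.0}

def fromfunc(m0, s):
    """inf over families f covering s of sum m0(f(i)); non-covers contribute inf."""
    best = float('inf')
    # on a finite universe it suffices to range over finite families
    # drawn from the powerset
    for r in range(1, len(subsets) + 1):
        for family in combinations(subsets, r):
            if s <= frozenset().union(*family):
                best = min(best, sum(m0[t] for t in family))
    return best

# the resulting outer measure never exceeds m0: fromfunc(m0)(s) ≤ m0(s)
for s in subsets:
    assert fromfunc(m0, s) <= m0[s]

# {1,2} is covered more cheaply by {1} ∪ {2} than by itself
print(fromfunc(m0, frozenset({1, 2})))  # 2.0
```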

Vaishnav also formalizes this for the more general case of a partial function \(m_0 : S \rightarrow \mathbb{R}^+_\infty\) (defined only on a collection \(S\) of sets), which is first extended to a total function:

\[ \text{extend} (m_0) (s) = \begin{cases} m_0 (s) \quad s \in S \\ \infty \quad s \notin S \end{cases} \]

Now is the moment you've been waiting for since event spaces.

Given a function \(f: \alpha \to \beta\) and an outer measure \(m_0\) on \(\alpha\), we define \(\text{map}_f(m_0)\) as:

\[ \text{map}_f(m_0)(s) = m_0(f^{-1}(s)) \]

Similarly, for an outer measure \(m_1\) on \(\beta\), we define \(\text{comap}_f(m_1)\) as:

\[ \text{comap}_f(m_1)(s) = m_1(f(s)) \]

Both \(\text{map}_f(m_0)\) and \(\text{comap}_f(m_1)\) are outer measures.
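A minimal numeric sketch of the map construction (my own toy, with the counting measure as \(m_0\) and a parity function as \(f\)):

```python
# map_f(m0)(s) = m0(f⁻¹(s)): push an outer measure on α forward along f.
# Toy example of my own: α = {0,1,2,3}, β = {'even','odd'}, m0 = counting.
alpha = {0, 1, 2, 3}

def f(x):
    return 'even' if x % 2 == 0 else 'odd'

def m0(a):          # counting measure on α
    return len(a)

def preimage(s):
    return {x for x in alpha if f(x) in s}

def map_f(s):
    return m0(preimage(s))

# {'even'} pulls back to {0, 2}, so it gets mass 2
assert map_f({'even'}) == 2
assert map_f({'even', 'odd'}) == len(alpha)
assert map_f(set()) == 0
print("map_f is the pushforward of m0 along f")
```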

Given an outer measure \(m_0\) on \(\alpha\), we can restrict it to a set \(t\) by defining:

\[ \text{restrict}_t(m_0)(s) = m_0(s \cap t) \]

This measures the intersection of \(s\) with \(t\). The function \(\text{restrict}_t(m_0)\) is proven to be an outer measure. In Lean, we can use the coercion function \(c_t: t \to \alpha\) to avoid the explicit proof. We can express it as:

\[ \text{restrict}_t(m_0) = \text{map}_{c_t}(\text{comap}_{c_t}(m_0)) \]

Alternatively, for any \(s\), it can be simplified to:

\[ \text{restrict}_t(m_0)(s) = m_0(c_t(c_t^{-1}(s))) \]

where \(c_t(c_t^{-1}(s))\) represents the intersection \(s \cap t\). Since \(\text{map}(\cdot)\) and \(\text{comap}(\cdot)\) preserve outer measures, we can conclude that \(\text{restrict}_t(m_0)(s)\) is an outer measure.

Now, that was a very close paraphrase of Vaishnav's text.
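Before moving on, here is a small check of my own (a finite toy, not from the thesis) that the two descriptions of restriction agree, spelling out the inclusion \(c_t\) by hand:

```python
# restrict_t(m0)(s) = m0(s ∩ t): my own finite toy, with the counting
# measure as m0 and the inclusion c_t : t → α spelled out.
alpha = {1, 2, 3, 4}
t = {1, 2}

def m0(a):              # counting measure on α
    return len(a)

def restrict_direct(s):
    return m0(s & t)

def restrict_via_inclusion(s):
    # c_t⁻¹(s) pulls s back into the subtype t; mapping it back into α
    # through c_t (the identity inclusion) yields exactly s ∩ t
    return m0({x for x in t if x in s})

for s in [set(), {1}, {3}, {1, 3}, alpha]:
    assert restrict_direct(s) == restrict_via_inclusion(s)
print("restrict_t(m0)(s) = m0(c_t(c_t^{-1}(s))) = m0(s ∩ t)")
```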

Alternatively, the beloved Wikipedia says:

Given a set \(X\), let \(2^X\) denote the collection of all subsets of \(X\), including the empty set \(\emptyset\).

An outer measure on \(X\) is a set function \(\mu: 2^X \to [0, \infty]\) such that:

  1. Null empty set: \(\mu(\emptyset) = 0\)
  2. Countably subadditive: For arbitrary subsets \(A, B_1, B_2, \ldots\) of \(X\), if \(A \subseteq \bigcup_{j=1}^\infty B_j\), then \(\mu(A) \leq \sum_{j=1}^\infty \mu(B_j)\).


measurable spaces

Outer measures provide information about inequalities in measure functions. We are also interested in equalities between measures of sets in the event space. For disjoint sets \(A\) and \(B\) in \(\alpha\), we want to express the equality \(m(A \cup B) = m(A) + m(B)\). In the general case where \(A\) and \(B\) are not necessarily disjoint, we seek the equality \(m(A \cup B) = m(A \setminus B) + m(A \cap B) + m(B \setminus A)\).

To control equality restrictions on measures, we introduce measurable spaces (\(σ\)-algebras). A measurable space \(M: \text{set}(\text{set} \ \alpha)\) includes "measurable sets" and satisfies:

  • Empty set measurability: \(\emptyset \in M\)
  • Complement closure: For any set \(s\) in \(\alpha\), if \(s \in M\), then its complement \(s^c \in M\)
  • Countable union closure: For any sequence \(f: \mathbb{N} \to \text{set} \ \alpha\), if \(f(i) \in M\) for all \(i\) in \(\mathbb{N}\), then \(\bigcup\limits_{i=0}^\infty f(i) \in M\)

From these properties, we can derive other intuitive closures, including those of countable intersection:

\[ \forall f : \mathbb{N} \to \text{set} \ \alpha, \forall i : \mathbb{N}, f(i) \in M \Rightarrow \bigcap_{i=0}^\infty f(i) \in M, \]

and difference:

\[ \forall a,b : \text{set} \ \alpha, \ a \in M \Rightarrow b \in M \Rightarrow a \setminus b \in M. \]
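These closure properties can be checked by brute force on a finite universe. The following is my own sketch (not Vaishnav's): generate the smallest family closed under complement and (here finite) union, then verify that intersection and difference closure fall out for free:

```python
# Closure of a family S under complement and union on a small finite
# universe: a brute-force generate(S) of my own (finite unions stand in
# for countable ones).
universe = frozenset({1, 2, 3})

def generate(S):
    M = {frozenset(), universe} | set(S)
    changed = True
    while changed:
        changed = False
        for a in list(M):
            comp = universe - a
            if comp not in M:
                M.add(comp); changed = True
            for b in list(M):
                if a | b not in M:
                    M.add(a | b); changed = True
    return M

M = generate({frozenset({1})})
assert M == {frozenset(), frozenset({1}), frozenset({2, 3}), universe}

# derived closures: intersection and difference stay inside M
for a in M:
    for b in M:
        assert a & b in M
        assert a - b in M
print(sorted(map(sorted, M)))
```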

Now, how does Vaishnav transform measurable spaces?

Given \(S : \text{set}(\text{set} \ \alpha)\), the measurable space "closure" of \(S\) is the smallest measurable space that includes every set in \(S\). The generated measurable space can be visualized as the "mosaic" of set intersections and differences formed by \(S\) and the universal set ("univ") in our event space. The set of measurable sets consists of all possible countable unions of the "tiles" in this mosaic.

Now, let \(C : \text{set} \ \alpha \rightarrow \text{Prop}\) be a predicate. To show that \(C\) holds for all sets in \(\text{generate}(S)\), we can prove that it holds for \(S\) itself \((1)\), the empty set \((2)\), under complement \((3)\), and under countable union \((4)\).

\[(1) \quad \forall s \in S, \ C(s) \]

\[(2) \quad C(\emptyset) \]

\[(3) \quad \forall s \in \text{generate}(S), \ C(s) \Rightarrow C(s^c)\]

\[(4) \quad \forall f : \mathbb{N} \to \text{set} \ \alpha, \quad \left(\forall i : \mathbb{N}, \ f(i) \in \text{generate}(S) \land C(f(i))\right) \Rightarrow C\left(\bigcup_{i=0}^{\infty} f(i)\right)\]

The Encyclopedia of Mathematics describes this more mathematically as:

Let \((X,A)\) and \((Y,B)\) be measurable spaces.

A map \(f:X\rightarrow Y\) is called measurable if \(f^{-1}(b)\in A\) for every \(b\in B\).

These two measurable spaces are called isomorphic if there exists a bijection \(f:X\rightarrow Y\) such that \(f\) and \(f^{-1}\) are measurable (such \(f\) is called an isomorphism).

Let \(X\) be a set, \((Y,B)\) a measurable space, and \((f_i)_{i\in I}\) a family of maps \(f_i:X\rightarrow Y\). The \(\sigma\)-algebra generated by these maps is defined as the smallest \(\sigma\)-algebra \(A\) on \(X\) such that all \(f_i\) are measurable from \((X,A)\) to \((Y,B)\). More generally, one may take measurable spaces \((Y_i,B_i)\) and maps \(f_i:X\rightarrow Y_i\). On the other hand, if \(Y\) is \(\mathbb{R}\) (or \(\mathbb{C}\), \(\mathbb{R}^n\), etc.) then \(B\) is by default the Borel \(\sigma\)-algebra.


But, even this description has a very topological tone. Note: a Borel set is simply formed from open sets "through the operations of countable union, countable intersection, and relative complement" (Wikipedia: Borel sets). This also extends via transfinite induction, etc., but that's outside our scope.

This props us up for a more intuitive (while less rigorous) picture:

Definition 4

Let \(F\) be a collection of subsets of \(\Omega\) such that:

  1. Closed under complements: If \(A \in F\), then \(A^C \in F\).
  2. Closed under countable unions: If \(A_1, A_2, \ldots\) are all in \(F\), then \(\bigcup_{i=1}^\infty A_i \in F\).

We call \(F\) a \(\sigma\)-algebra, and the elements of \(F\) measurable sets.

(Huber, Probability: Lectures and Labs)

No mention of topology's maps, Borel, & generating, or of type theory's ... predicates ... (I guess the type theory is not too bad yet).

That being said, Huber does clarify:

Whenever Ω is a finite set, F will be the set of all subsets of Ω. When Ω = R, F is typically taken to be the Borel sets. In this first course in probability we will not define the Borel sets precisely, except to note that any interval (open or closed, finite or infinite) is a member of the Borel sets.

Vaishnav also looks at "Combining Measurable Spaces":

Given a set \(S\) of measurable spaces, we can now define their "combination" into the smallest measurable space that contains all of them (i.e. their supremum) as

\[ \text{sup}(S) = \text{generate}\left(\bigcup_{s \in S} s\right), \]

the measurable space generation on the union (set supremum) of all of their sets of measurable sets.

Again, Vaishnav considers "Mapping/Co-Mapping Measurable Spaces":

As for outer measures, a map/comap on \((M, \alpha)\) with respect to \(f\) is defined given a second type \(\beta\), a function \(f : \alpha \to \beta\), and a measurable space \((M, \alpha)\) on \(\alpha\). The map is the set of all sets of \(\beta\) whose preimage is a measurable set:

\[ \text{map}_f((M, \alpha)) = \{ b : \text{set} \ \beta \mid f^{-1}(b) \in (M, \alpha) \} \]

Given a measurable space \((M, \beta)\) on \(\beta\), we define our comap to be the set of all sets of \(\alpha\) that are the preimage of some measurable set:

\[ \text{comap}_f((M, \beta)) = \{ a : \text{set} \ \alpha \mid \exists b \in (M, \beta), f^{-1}(b) = a \} \]

(you can also think of this as simply taking the preimages of every set in \((M, \beta)\)).

From here, the lovely Vaishnav mentions:

In mathematical notation, the definition can be written as follows:

A Galois connection is a pair of functions \(l: \alpha \to \beta\) and \(u: \beta \to \alpha\) between preordered sets \(\alpha\) and \(\beta\) that satisfy the following property:

\[ \forall a \in \alpha, b \in \beta, \quad l(a) \leq b \iff a \leq u(b) \]

Here, \(\leq\) represents the order relation in the respective preordered sets \(\alpha\) and \(\beta\). The functions \(l\) and \(u\) establish a connection between the orders in \(\alpha\) and \(\beta\) such that if \(l(a)\) is less than or equal to \(b\), then \(a\) is less than or equal to \(u(b)\), and vice versa.
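A classic concrete instance of this definition (my own illustration, not from the thesis): for any \(f : \alpha \to \beta\), the image and preimage operations form a Galois connection between the powersets ordered by inclusion, which we can verify exhaustively on a tiny example:

```python
from itertools import combinations

# Galois connection check (my own toy): l = image under f, u = preimage,
# between powersets of α and β ordered by inclusion:
#   l(a) ⊆ b  ⇔  a ⊆ u(b)
alpha = [0, 1, 2]
beta = ['x', 'y']

def f(n):
    return 'x' if n < 2 else 'y'

def image(a):
    return frozenset(f(n) for n in a)

def preimage(b):
    return frozenset(n for n in alpha if f(n) in b)

def powerset(xs):
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

for a in powerset(alpha):
    for b in powerset(beta):
        assert (image(a) <= b) == (a <= preimage(b))
print("image ⊣ preimage is a Galois connection")
```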

Vaishnav explains the effect of this, but gives the lemmas in Lean, so here is the property of the lemmas NOT in Lean!

In mathematical notation, an essential property of a Galois connection is that the upper/lower adjoint of a Galois connection uniquely determines the other (here \(F\) plays the role of the lower adjoint \(l\) above, and \(G\) the upper adjoint \(u\)):

\[ \begin{align*} F(a) & \text{ is the least element } \tilde{b} \text{ with } a \leq G(\tilde{b}), \\ G(b) & \text{ is the largest element } \tilde{a} \text{ with } F(\tilde{a}) \leq b. \end{align*} \]

Source

But, as Vaishnav explains:

In our case, we can show that, for any \(f: \alpha \to \beta\), there is a Galois Connection between \(\text{comap}_f(\cdot)\) and \(\text{map}_f(\cdot)\) (where the \(\leq\) relation on measurable spaces is the subset relation). To see this, consider an arbitrary measurable space \(M_\beta\) on \(\beta\) and \(M_\alpha\) on \(\alpha\). Here, \(\text{comap}_f(M_\beta) \subseteq M_\alpha\) means that the preimage of every set in \(M_\beta\) is in \(M_\alpha\), which is exactly the same statement as \(M_\beta \subseteq \text{map}_f(M_\alpha)\).

The Galois Connection gives us the fact that, for any list of measurable spaces \(M_{\beta_i}\) on \(\beta\) indexed by \(i\) in \(\iota\), we have that the \(\text{comap}_f\) of the indexed supremum of \(M_{\beta_i}\) is the indexed supremum of the \(\text{comap}_f\) of each measurable space (regardless of whether the index type is countable):

\[ \text{comap}_f \left( \sup_{i \in \iota} M_{\beta_i} \right) = \sup_{i \in \iota} \left( \text{comap}_f (M_{\beta_i}) \right) \]

Vaishnav has even more to say about trimmed outer measure:

Suppose we have a measurable space \( M \) on \( \alpha \) and a partial set function \( m_0 \) defined on \( M \), satisfying the three properties of an outer measure restricted to sets from \( M \). We can define the outer measure \(\text{induce}(m_0) = \text{fromfunc}(\text{extend}(m_0))\). Interestingly, with this function, we have the property:

\[ \forall s : \text{set } \alpha, \ \text{induce}(m_0)(s) = \inf_{t : \text{set } \alpha} \left\{ s \subseteq t \land t \in M \Rightarrow m_0(t) \right\} \]

In other words, the measurable sets retain their values from the partial function due to the monotonicity property, and each non-measurable set is assigned the infimum of the values of all its measurable supersets.

\([...]\)

As a matter of fact, there exists a measurable superset that exactly matches this infimum:

\[ \forall s : \text{set } \alpha, \exists t : \text{set } \alpha, \ s \subseteq t \land t \in M \land \text{induce}(m_0)(s) = m_0(t) \]

The proof of this relies on constructing a list \(f: \mathbb{N} \to \text{set } \alpha\) of measurable sets that "narrows down" to \(\text{induce}(m_0)(s)\) in the sense that:

\[ \forall i \in \mathbb{N}, \ m_0(f(i)) < \text{induce}(m_0)(s) + i^{-1} \]

Now, given an arbitrary outer measure \(m\) and a measurable space \(M\), we can check whether \(m\) was constructed as above using the notion of a "trimmed" outer measure. Let \(\text{sub}_M(m)\) be \(m\) restricted to the domain \(M\). Defining:

\[ \text{trim}_M(m) = \text{induce}(\text{sub}_M(m)) \]

We can now check whether \(\text{trim}_M(m) = m\), i.e., whether \(\text{trim}_M\) assigns the same values to non-measurable sets as \(m\) originally had.

Finally, Vaishnav touches on Pi-types, which bring a unique type-theoretic perspective to measure theory.

Here is a definition I like of Pi-types:

Given a dependent type \(x : C \vdash D(x)\), the Pi-type \(\Pi x : C. D(x)\) can be thought of as the set of all functions \(f\) from \(C\) to \(\bigcup_{x : C} D(x)\) such that \(f(x) \in D(x)\) for all \(x \in C\). These functions are called dependent maps from \(C\) to \(D\). In set theory, we interpret simple types \(C\) as sets \(C\) and terms \(x : C\) as elements \(x \in C\). Dependent types \(D\) over \(C\) are interpreted as families \(D = (D(x))_{x \in C}\) of sets \(D(x)\).

Source


Suppose we have an index of types \(\beta : \iota \rightarrow \text{Type}\) and a mapping \(M\) that assigns a measurable space to each type. We want to define a measurable space on the function space \(\prod_{i:\iota} \beta(i)\) that corresponds to \(M\).

To accomplish this, we define a "function evaluation" function \(eval(i)\) that extracts the value at a specific index. Given a function \(f : \prod_{i:\iota} \beta(i)\), \(eval(i)(f)\) gives us the value of \(f\) at index \(i\).

Next, we consider the preimage of a set \(s\) in type \(\beta(i)\) under the function \(eval(i)\). This preimage consists of all functions whose value at index \(i\) lies in the set \(s\).

By applying the comap operation along \(\text{eval}(i)\) to \(M(i)\), we obtain a measurable space on the function space whose measurable sets are exactly these preimages. We repeat this process for all indices \(i\) and combine the resulting measurable spaces using the supremum operation. This gives us the definition of the "pi measurable space":

\[\Pi \text{-measurable-space}(M) = \sup_{i:\iota} \left( \text{comap}_{\text{eval}(i)}(M(i)) \right)\]
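The preimages along \(\text{eval}(i)\) are "cylinder" sets, which we can enumerate explicitly on a tiny Pi type (my own toy: both the index type and each \(\beta(i)\) are two-element, so functions become tuples):

```python
from itertools import product

# Cylinder sets on a tiny Pi type (my own toy): ι = {0,1}, β(i) = {0,1},
# so ∏ β(i) is the 4 functions {0,1} → {0,1}, encoded as tuples.
pi_type = list(product([0, 1], repeat=2))

def eval_i(i, f):
    return f[i]

def preimage(i, s):
    """eval(i)⁻¹(s): all functions whose value at index i lies in s."""
    return frozenset(f for f in pi_type if eval_i(i, f) in s)

# the preimage of {1} under eval(0) is the cylinder fixing coordinate 0
cyl = preimage(0, {1})
assert cyl == frozenset({(1, 0), (1, 1)})

# preimages along different coordinates intersect like independent axes
assert preimage(0, {1}) & preimage(1, {0}) == frozenset({(1, 0)})
print("comap along eval(i) is built from these cylinder sets")
```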

This is not too important to measure-theoretic probability theory?

defining a measure

We now have all the definitions we need to define an outer measure that corresponds to the concept of probability, allowing us to derive useful equalities based on a measurable space. For any \( f : \mathbb{N} \to \text{set } \alpha \), let \( PD(f) \) indicate that \( f \) is pairwise disjoint, meaning the sets that \( f \) maps to with any distinct pair of indices are disjoint. Given a measurable space \( M \), we define a "measure" on \( M \) as an outer measure \( m \) that additionally satisfies the properties of disjoint measurable set countable union and trim invariance:

\[ \forall f : \mathbb{N} \to \text{set } \alpha, (\forall i : \mathbb{N}, f(i) \in M) \land (PD(f)) \Rightarrow \]
\[\quad m\left(\bigcup_{i=0}^\infty f(i)\right) = \sum_{i=0}^{\infty} m(f(i)), \]
\[ \text{trim}_M(m) = m\]

The disjoint measurable set countable union property ensures that this measure corresponds to the notion of probability, as we know it, where the measure of the union of any list of disjoint measurable sets can be expressed as the sum of their individual measures.

As Vaishnav explains, the final criterion for an outer measure to correspond to a notion of probability is that it assigns a measure of 1 to the entire event space. A measure on the event space \(\alpha\) satisfying \(\mu(\text{univ} : \text{set } \alpha) = 1\) is referred to as a "probability measure". In general, probability measures are a type of "finite measure" that satisfies \(\mu(\text{univ}) < \infty\).
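A fair die gives a concrete probability measure satisfying all of these conditions (my own toy check, using exact rationals):

```python
from fractions import Fraction

# A fair die as a probability measure on α = {1,…,6} (my own toy):
# m(s) = |s|/6; check additivity on disjoint measurable sets and m(univ)=1.
univ = frozenset(range(1, 7))

def m(s):
    return Fraction(len(s), 6)

evens, odds = frozenset({2, 4, 6}), frozenset({1, 3, 5})
assert evens & odds == frozenset()              # pairwise disjoint
assert m(evens | odds) == m(evens) + m(odds)    # disjoint union additivity
assert m(univ) == 1                             # probability measure
assert m(univ) < float('inf')                   # hence a finite measure
print("m is a probability measure on the die")
```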

measure transformations

Measures are themselves a type of outer measure satisfying additional criteria, so to transform measures we "lift" outer measure transformations to the context of measures.

Two specific transformations we focus on are \(restrict(·)\) and \(map(·)\), as defined earlier. However, directly applying these transformations to a measure may not always result in an outer measure that satisfies the criteria to be considered a measure. To address this, we need additional transformations that preserve the behavior of the original transformation on measurable sets and ensure that the resulting outer measure remains a measure.

This, again, comes down to verifying properties of outer measure and properties of transformations on measurable spaces already stated.

Recall, for outer measure, we have the empty set property \((1)\), monotonicity property \((2)\), and countable union property \((3)\):

\[ \begin{align*} m(\emptyset) &= 0, \quad & (1) \\ \forall A, B : \text{set } \alpha, A \subseteq B & \Rightarrow m(A) \leq m(B), \quad & (2) \\ \forall f : \mathbb{N} \rightarrow \text{set } \alpha, m\left(\bigcup_{i=0}^\infty f(i)\right) & \leq \sum_{i=0}^\infty m(f(i)), \quad & (3) \end{align*} \]

Recall, for transforming measurable spaces let \(C : \text{set} \ \alpha \rightarrow \text{Prop}\) be a predicate. To show that \(C\) holds for all sets in \(\text{generate}(S)\), we can prove that it holds for \(S\) itself \((1)\), the empty set \((2)\), under complement \((3)\), and under countable union \((4)\).

\[(1) \quad \forall s \in S, \ C(s) \]

\[(2) \quad C(\emptyset) \]

\[(3) \quad \forall s \in \text{generate}(S), \ C(s) \Rightarrow C(s^c)\]

\[(4) \quad \forall f : \mathbb{N} \to \text{set} \ \alpha, \quad \left( \forall i : \mathbb{N}, \ f(i) \in \text{generate}(S) \land C(f(i)) \right) \Rightarrow C\left(\bigcup_{i=0}^{\infty} f(i)\right) \]

But, most importantly, we must have disjoint measurable set countable union:

\[ \forall f : \mathbb{N} \to \text{set } \alpha, (\forall i : \mathbb{N}, f(i) \in M) \land (PD(f)) \Rightarrow \]
\[\quad m\left(\bigcup_{i=0}^\infty f(i)\right) = \sum_{i=0}^{\infty} m(f(i)), \]

And, the Carathéodory Criterion is sufficient to prove disjoint measurable set countable union.

Given a set function \(f : \text{set } \alpha \rightarrow \mathbb{R}^+_\infty\) and a set \(s\), \(s\) is referred to as "Carathéodory" (written as \(\text{cara } f(s)\)) if \(\forall t : \text{set } \alpha, f(t) = f(t \cap s) + f(t \setminus s)\), which means that the measure of any set \(t\) can be decomposed into the measure of its intersection with \(s\) and the measure of its difference with \(s\). Now, consider an outer measure \(m\) and measurable space \(M\). It can be shown that the set of all Carathéodory sets forms a measurable space.
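The criterion can be tested exhaustively on a finite universe. In this sketch of my own (not from the thesis) the set function is the counting measure, which is additive, so every subset passes:

```python
from itertools import combinations

# Brute-force Carathéodory sets (my own toy): s is Carathéodory for f iff
# f(t) = f(t ∩ s) + f(t \ s) for every t. With the counting measure every
# subset qualifies, since |t| always splits as |t∩s| + |t\s|.
universe = frozenset({1, 2, 3})

def powerset(xs):
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def f(a):                # counting measure; additive, so every
    return len(a)        # set passes the criterion

def is_caratheodory(s):
    return all(f(t) == f(t & s) + f(t - s) for t in powerset(universe))

cara = {s for s in powerset(universe) if is_caratheodory(s)}
assert cara == set(powerset(universe))

# the Carathéodory sets form a measurable space: closed under complement
for s in cara:
    assert universe - s in cara
print(f"{len(cara)} Carathéodory sets")  # prints "8 Carathéodory sets"
```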

At this point, the treatment relies on higher levels of type-theoretic abstraction. The more elementary treatments thus far have been alright, with support from topology every now and then. Still, for those interested in a higher-level type-theoretic perspective, check out Vaishnav's subsections on "Lifting an Outer Measure Transformation", "Restricting a Measure", and "Mapping a Measure". These topics can also be treated with topology outside the scope of these notes.

probability theory

independence

Given a measure \(\mu\) on \(\alpha\) and two events \(a\) and \(b\) from our event space, we define independence by requiring that the measure of their intersection is the product of their measures:

\[ \mu(a \cap b) = \mu(a) \mu(b) \]
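Two fair coin flips give the standard concrete instance (my own numeric example, not from the thesis):

```python
from fractions import Fraction

# Two fair coin flips (my own toy): α = {HH, HT, TH, TT}, uniform μ.
univ = frozenset({'HH', 'HT', 'TH', 'TT'})

def mu(s):
    return Fraction(len(s), 4)

a = frozenset({'HH', 'HT'})   # first flip is heads
b = frozenset({'HH', 'TH'})   # second flip is heads
assert mu(a & b) == mu(a) * mu(b)   # independent: 1/4 = 1/2 · 1/2

c = frozenset({'HH'})          # both heads: NOT independent of a
assert mu(a & c) != mu(a) * mu(c)   # 1/4 ≠ 1/2 · 1/4
print("first-flip and second-flip events are independent")
```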

However, we can extend this notion of independence to sets of events. We define independence between sets of events \(A\) and \(B\) under \(\mu\) as follows:

\(A\) and \(B\) are independent under \(\mu\) if for any pair of events \(a \in A\) and \(b \in B\), the measure of their intersection is the product of their measures:

\[ \text{indep_sets}_\mu(A, B) \Leftrightarrow \forall a \in A, b \in B, \mu(a \cap b) = \mu(a) \mu(b) \]

This basically says: two measurable spaces are independent with respect to a measure if their sets of events (which are, again, sets of sets) are independent with respect to that measure. This does not need math markup, sorry Vaishnav.

Given two sets \(a,b : \text{set } \alpha\), we define independence between them under \(\mu\) as:

\[ \text{indep_set}_\mu(a,b) \Leftrightarrow \text{indep}_\mu(\text{generate}(\{a\}), \text{generate}(\{b\})), \]

which is the independence of their generated measurable spaces. Note that for any set \(s : \text{set } \alpha\), \(\text{generate}(\{s\}) = \{\emptyset,s,s^c,\text{univ}\}\) exactly. It may be immediately obvious that this definition is equivalent to \(\mu(a \cap b) = \mu(a) \mu(b)\), but we will soon show that this fact arises from a more general result regarding independence on the measurable spaces generated by a pair of independent "\(π\)-systems".

Now, that was Vaishnav's idea of definitions, so here is Vaishnav's idea of properties.

A "Dynkin system," also known as a "\(λ\)-system," is a generalization of measurable spaces that enables an induction principle. We define a Dynkin system as a set of sets \(M : \text{set } (\text{set } \alpha)\) with the properties of empty set measurability \((1)\), complement closure \((2)\), and disjoint countable union closure \((3)\):

  • \(\emptyset \in M \quad (1) \)
  • \(\forall s : \text{set } \alpha, s \in M \Rightarrow s^c \in M \quad (2) \)
  • \(\forall f : \mathbb{N} \rightarrow \text{set } \alpha, \left(\forall i : \mathbb{N}, f(i) \in M\right) \land \text{PD}(f) \Rightarrow \bigcup_{i=0}^{\infty} f(i) \in M \quad (3) \)

The disjointedness condition in \((3)\) is the only difference from the definition of measurable spaces given earlier.

Given any \(S : \text{set } (\text{set } \alpha)\), we can consider its Dynkin system "closure," which is the smallest Dynkin system including every set in \(S\). We define this closure similarly to how we did for measurable spaces with \(\text{generate}(S)\), modifying the last construction rule to union only over disjoint sets. We denote this generator as \(\text{generate}_D(S)\).

Now, suppose we have some predicate \(C : \text{set } \alpha \rightarrow \text{Prop}\). If we want to show that \(C\) holds for all sets in \(\text{generate}_D(S)\), we have to prove that \(1\) it holds for \(S\) itself, \(2\) for the empty set, \(3\) under complement, and \(4\) under disjoint countable union:

  • \(\forall s \in S, C(s)\quad (1) \)
  • \(C(\emptyset) \quad (2) \)
  • \(\forall s \in \text{generate}_D(S), C(s) \Rightarrow C(s^c) \quad (3) \)
  • \(\forall f : \mathbb{N} \rightarrow \text{set } \alpha, \left( \forall i : \mathbb{N}, f(i) \in \text{generate}_D(S) \land C(f(i)) \right) \land \text{PD}(f) \Rightarrow C \left( \bigcup_{i=0}^{\infty} f(i) \right) \quad (4) \)

As Vaishnav explains, Dynkin systems have an important property: closure under intersection implies measurability. The proof relies on the ability to derive difference closure from intersection closure, enabling the decomposition of sets into a disjoint list with the same union. Additionally, the set \(\{t \mid t \cap s \in D\}\), where \(D\) is a Dynkin system and \(s \in D\), is also a Dynkin system.

To determine the applicability of the extra disjointedness condition, we introduce \(\pi\)-systems. These systems establish the equivalence between generated measurable spaces and generated Dynkin systems, allowing us to leverage \(\pi\)-system properties in specific situations.

Vaishnav then describes Dynkin's \(\pi\)-\(\lambda\) theorem.

We define the "\(π\)-system" criterion for a set \(S : \text{set } (\text{set } \alpha)\) as:

\[PI(S) \Leftrightarrow \forall s, t \in S, s \cap t \neq \emptyset \Rightarrow s \cap t \in S,\]

which simply requires that \(S\) is closed under non-empty intersection. The inclusion of the non-empty requirement allows for greater generality, although it is not necessary for our results. It is immediately evident that singleton sets satisfy the \(π\)-system criterion:

\[\forall s : \text{set } \alpha, PI(\{s\})\]

Remarkably, if \(PI(S)\), then the generated Dynkin system, \(\text{generate}_D(S)\), is closed under intersection, making it a measurable space. Furthermore, we have \(\text{generate}_D(S) = \text{generate}(S)\), indicating that the generated measurable space and Dynkin systems are identical. This result is known as Dynkin's \(π\)-\(λ\) theorem.

We now have the criterion needed for a new induction principle on generated measurable spaces: if they are generated by a \(π\)-system, we can replace "countable union for generating measurable spaces" (recall:)

\[\forall f : \mathbb{N} \rightarrow \text{set } \alpha, \left( \forall i : \mathbb{N}, f(i) \in \text{generate}(S) \land C(f(i)) \right) \Rightarrow C \left( \bigcup_{i=0}^{\infty} f(i) \right)\]

with "countable union for generating \(π\)-systems":

\[(\forall i : \mathbb{N}, \ f(i) \in generate(S) \land C(f(i))) \land PD(f) \Rightarrow C\left(\bigcup_{i=0}^\infty f(i)\right),\]

where the new hypothesis, \(PD(f)\), assists in the proof.

Now, Vaishnav offers an axiomatic definition of independence:

With the help of "countable union for generating \(π\)-systems" we can prove that,

\[ \forall S, T : \text{set}(\text{set } \alpha), \ PI(S) \land PI(T) \Rightarrow \left( \text{indep_sets}_\mu(S, T) \Leftrightarrow \text{indep}_\mu(\text{generate}(S), \text{generate}(T)) \right) \]

in English, the independence of \(\pi\)-systems implies the independence of the measurable spaces they generate. We can now simplify our criterion for independence of sets: from ("the independence of \(\pi\)-systems implies the independence of the measurable spaces they generate"), and the fact that singletons are \(\pi\)-systems, we have

\[\forall s, t : \text{set } \alpha, \text{indep_set}_\mu (s, t) \Leftrightarrow \text{indep_sets}_\mu (\{s\}, \{t\}) \Leftrightarrow \mu(s \cap t) = \mu(s) \mu(t)\]

So, we now have license to use the "bundled" \(\text{indep_set}_\mu(s, t)\) in place of our classical definition of independence (again, the equivalence can easily be seen by cases, but it's good to know the general result).
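The case analysis can be done mechanically. Here is my own finite check (not from the thesis): given \(\mu(s \cap t) = \mu(s)\mu(t)\), every pair drawn from the generated measurable spaces \(\{\emptyset, s, s^c, \text{univ}\}\) and \(\{\emptyset, t, t^c, \text{univ}\}\) is independent:

```python
from fractions import Fraction

# Checking the claim concretely (my own toy): if μ(s∩t) = μ(s)μ(t), then
# every pair drawn from generate({s}) = {∅, s, sᶜ, univ} and
# generate({t}) = {∅, t, tᶜ, univ} is independent under μ.
univ = frozenset({'HH', 'HT', 'TH', 'TT'})   # two fair coin flips, uniform μ

def mu(x):
    return Fraction(len(x), 4)

s = frozenset({'HH', 'HT'})   # first flip heads
t = frozenset({'HH', 'TH'})   # second flip heads
assert mu(s & t) == mu(s) * mu(t)

def gen(a):                    # generate({a}) for a single set
    return [frozenset(), a, univ - a, univ]

for a in gen(s):
    for b in gen(t):
        assert mu(a & b) == mu(a) * mu(b)
print("indep_set holds on the generated measurable spaces")
```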

the end

At this point, I do not find the remaining formalities conducive to my personal study of probability theory, although they might interest me more if I were more familiar with Lean.

I have continued more relevant discussion of measure theory in the context of intro to probability at my recent post notes - cont. measure & intro prob.

Check out the rest of Vaishnav's paper for (again, you might find it here, ceteris paribus):

  • conditional probability
  • random variables
    • the joint distribution
    • restricting a pi
    • the marginal distribution
    • marginalization
    • independence on random variables
    • conditional independence on random variables